
    On Monte Carlo tree search and reinforcement learning

    Fuelled by successes in computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not yet been thoroughly studied. In this paper we re-examine in depth this close relation between the two fields; our goal is to improve the cross-awareness between the two communities. We show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, for which the traditional MCTS is only one of the variants. We confirm that planning methods inspired by RL, in conjunction with online search, demonstrate encouraging results on several classic board games and in arcade video game competitions, where our algorithm recently ranked first. Our study promotes a unified view of learning, planning, and search.
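
    The RL reading of MCTS that this paper builds on can be made concrete in a short sketch: the code below is a minimal UCT loop on a toy chain-walk domain in which every tree node keeps a running Monte Carlo estimate of its return, and the UCB1 selection rule trades that estimate off against an exploration bonus. The domain, constants, and function names are illustrative assumptions, not the authors' implementation.

# Minimal UCT sketch on a toy chain-walk domain (illustrative assumption, not
# the paper's code): node values are incremental Monte Carlo estimates of
# sampled returns, which is the value-estimation view taken in the paper.
import math
import random


class Node:
    """A tree node holding an incremental Monte Carlo estimate of its return."""
    def __init__(self, state):
        self.state = state
        self.visits = 0
        self.value = 0.0      # running mean of sampled returns
        self.children = {}    # action -> Node


def legal_actions(state):
    return [-1, +1]           # move left or right along the chain


def step(state, action):
    next_state = state + action
    reward = 1.0 if next_state == 5 else 0.0      # goal at the right end
    done = next_state in (-5, 5)
    return next_state, reward, done


def rollout(state, depth=20):
    """Default policy: uniform random playout; returns the sampled return."""
    total = 0.0
    for _ in range(depth):
        state, reward, done = step(state, random.choice(legal_actions(state)))
        total += reward
        if done:
            break
    return total


def uct_select(node, c=1.4):
    """UCB1: exploit the Monte Carlo value estimate, explore rarely tried moves."""
    return max(
        node.children.items(),
        key=lambda kv: kv[1].value
        + c * math.sqrt(math.log(node.visits + 1) / (kv[1].visits + 1e-9)),
    )


def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node, path, state, done = root, [root], root_state, False
        # Selection: descend while the current node is fully expanded.
        while not done and node.children and len(node.children) == len(legal_actions(state)):
            action, node = uct_select(node)
            state, reward, done = step(state, action)
            path.append(node)
        # Expansion: add one untried child.
        if not done:
            action = random.choice([a for a in legal_actions(state) if a not in node.children])
            state, reward, done = step(state, action)
            node.children[action] = Node(state)
            node = node.children[action]
            path.append(node)
        # Simulation + backup: the sampled return updates every node on the path,
        # so each node's value is a Monte Carlo estimate, as in the RL reading.
        ret = reward + (0.0 if done else rollout(state))
        for n in path:
            n.visits += 1
            n.value += (ret - n.value) / n.visits
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


if __name__ == "__main__":
    print("best first move from the centre of the chain:", mcts(0))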

    Monte Carlo tree search strategies

    Since their breakthrough in computer Go, Monte Carlo tree search (MCTS) methods have sparked something of a revolution in game-playing agents: the artificial intelligence (AI) community has since developed an enormous number of MCTS variants and enhancements that advanced the state of the art not only in games, but also in several other domains. Although MCTS methods merge the generality of random sampling with the precision of tree search, their convergence rate can be relatively low in practice, especially when not aided by additional enhancements. This is why practitioners often combine them with expert or prior knowledge, heuristics, and handcrafted strategies. Despite the outstanding results (like the AlphaGo engine, which defeated the best human Go players, prodigiously overcoming this grand challenge of AI), such task-specific enhancements decrease the generality of many applied MCTS algorithms. Improving the performance of core MCTS methods, while retaining their generality and scalability, has proven difficult and remains a current research challenge. This thesis presents a new approach for the general improvement of MCTS methods and, at the same time, advances the fundamental theory behind MCTS by taking inspiration from the older and well-established field of reinforcement learning (RL). The links between MCTS, regarded as a search and planning framework, and RL theory have been outlined in the past; however, they have neither been thoroughly studied, nor have the existing studies significantly influenced the larger game AI community. Motivated by this, we re-examine in depth the close relation between the two fields and not only detail the similarities, but also identify and emphasize the differences between them. We present a practical way of extending MCTS methods with RL dynamics: we develop the temporal difference tree search (TDTS) framework, a novel class of MCTS-like algorithms that learn via temporal differences (TD) instead of Monte Carlo sampling. This can be understood both as a generalization of MCTS with TD learning and as an extension of traditional TD learning methods with tree search and novel MCTS concepts. Through TDTS we show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, for which the traditional MCTS is only one of the variants. We experiment with several such algorithms, focusing on an extension of the upper confidence bounds for trees (UCT) algorithm with the Sarsa(λ) algorithm, an on-policy TD learning method. Our evaluations confirm that MCTS-like methods inspired by RL dynamics demonstrate superior results on several classic board games and arcade video games: the new algorithms preserve robustness and computational complexity, while consistently outperforming their basic MCTS counterparts. Our findings encourage cross-fertilization between the game AI and RL communities, hopefully narrowing the gap between them, and promote a unified view of search and learning methods, recommending it as a promising research direction.
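
    The central idea of the TDTS framework described in this abstract, replacing the Monte Carlo backup of MCTS with a temporal-difference backup over the nodes visited in one simulation, can be sketched as a single backup routine. The snippet below is an illustrative offline TD(λ)-style backup; the node layout, parameter values, and function name are assumptions, and it is not the thesis's exact UCT plus Sarsa(λ) algorithm.

def td_lambda_backup(path, rewards, alpha=0.1, gamma=0.99, lam=0.8):
    """Backup pass for one simulation (illustrative sketch, not the thesis's code).

    path    -- visited nodes, root first; each node has .value and .visits
    rewards -- rewards[i] was received after leaving path[i]; the last entry
               also folds in the rollout / terminal value (an assumption of
               this sketch).
    """
    g = 0.0
    # Walk backwards, accumulating the lambda-return as in offline TD(lambda):
    # G_i = r_i + gamma * ((1 - lam) * V(next node) + lam * G_{i+1}).
    for i in reversed(range(len(path))):
        node = path[i]
        node.visits += 1
        bootstrap = path[i + 1].value if i + 1 < len(path) else 0.0
        g = rewards[i] + gamma * ((1 - lam) * bootstrap + lam * g)
        # TD-style update: move the node's value toward the lambda-return
        # instead of storing a plain Monte Carlo average.
        node.value += alpha * (g - node.value)

    In this sketch, setting lam = 1, gamma = 1, and a step size of 1/visits recovers the plain averaged Monte Carlo backup, which illustrates the abstract's point that traditional MCTS is only one variant within the broader family.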

    Reinforcement learning on the cart-pole problem

    The main goal of this thesis was the implementation and evaluation of two reinforcement learning algorithms on a computer-simulated control problem. Reinforcement learning is a branch of machine learning that combines principles of dynamic programming and supervised learning for problem solving. As the benchmark system we chose the cart-pole control problem, as it is widely used in this field for testing the efficiency of learning algorithms. From among the reinforcement learning methods we chose two temporal difference learning algorithms; this type of learning draws on both dynamic programming and Monte Carlo methods. The first chosen algorithm is Q-learning; the second is an actor-critic algorithm based on an associative search element and an adaptive critic element. To achieve this goal, we developed a computer application for experimental testing of learning on the simulated benchmark system, aiming to make the tool as modular and reusable as possible. We defined a custom method of performance evaluation and used it to assess both learning algorithms over a wide set of simulation parameters. We also measured the computational performance of both algorithms.
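
    As a rough illustration of the first of the two evaluated algorithms, the sketch below runs tabular Q-learning on a coarsely discretized, self-contained cart-pole simulation. The physics constants follow the classic benchmark formulation, but the bin boundaries, learning parameters, and episode limits are assumptions chosen for the example, not the settings used in the thesis.

# Tabular Q-learning on a discretized cart-pole (illustrative sketch only; the
# discretization and hyperparameters below are assumptions, not the thesis's).
import math
import random

# Classic cart-pole physical constants (Barto/Sutton/Anderson benchmark values).
GRAVITY, CART_MASS, POLE_MASS, POLE_HALF_LEN, FORCE, DT = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02
TOTAL_MASS = CART_MASS + POLE_MASS


def step(state, action):
    """One Euler step of the cart-pole dynamics; action 0/1 pushes left/right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS * POLE_HALF_LEN * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LEN * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLE_MASS * POLE_HALF_LEN * theta_acc * cos_t / TOTAL_MASS
    x, x_dot = x + DT * x_dot, x_dot + DT * x_acc
    theta, theta_dot = theta + DT * theta_dot, theta_dot + DT * theta_acc
    failed = abs(x) > 2.4 or abs(theta) > 12 * math.pi / 180
    # Classic reward scheme: 0 per step, -1 when the pole falls or the cart escapes.
    return (x, x_dot, theta, theta_dot), (-1.0 if failed else 0.0), failed


def discretize(state):
    """Coarse bins over position, velocity, angle and angular velocity (assumed values)."""
    x, x_dot, theta, theta_dot = state
    return (
        min(int((x + 2.4) / 1.6), 2),
        0 if x_dot < -0.5 else (2 if x_dot > 0.5 else 1),
        min(int((theta + 12 * math.pi / 180) / (4 * math.pi / 180)), 5),
        0 if theta_dot < -0.87 else (2 if theta_dot > 0.87 else 1),
    )


def q_learning(episodes=2000, alpha=0.2, gamma=0.99, epsilon=0.1, max_steps=500):
    q = {}  # (discrete_state, action) -> value estimate
    for _ in range(episodes):
        state, done, steps = (0.0, 0.0, 0.0, 0.0), False, 0
        while not done and steps < max_steps:
            s = discretize(state)
            # Epsilon-greedy behaviour policy over the two push directions.
            if random.random() < epsilon:
                a = random.choice([0, 1])
            else:
                a = max([0, 1], key=lambda act: q.get((s, act), 0.0))
            state, reward, done = step(state, a)
            s2 = discretize(state)
            best_next = 0.0 if done else max(q.get((s2, act), 0.0) for act in (0, 1))
            # Off-policy TD update toward the greedy bootstrapped target.
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
            steps += 1
    return q


if __name__ == "__main__":
    table = q_learning()
    print("learned", len(table), "state-action values")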